Developer Source 7 / developer source - volume 7.iso / dobbs / mar97 / singf106.gif
Graphics Interchange Format  |  1997-06-26  |  26.0 KB  |  244x290  |  4-bit (16 colors)
[Figure 6 diagram: a trajectory of states and actions s_t, a_t → s_{t+1}, a_{t+1} → s_{t+2}, a_{t+2} → ...]
Figure 6: The program's experience consists of a trajectory through state space. At time step t, the state is s_t, and the agent faces a choice of actions. The action the agent chooses to execute at step t is a_t. The reward at step t, Reward_t, is a function of s_t and a_t. The next state, s_{t+1}, depends on s_t, a_t, and many random events, such as passengers arriving at floors and pushing buttons. Reinforcement learning allows a program to use such a trajectory to incrementally improve its policy.
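The loop in the caption (observe s_t, choose a_t, receive Reward_t, move to s_{t+1}, update) can be sketched with tabular Q-learning, one standard reinforcement learning method. The figure does not say which algorithm the program actually uses, so treat Q-learning here as an assumption. The five-floor environment, the slip probability standing in for "random events", the reward rule, and every name below (`q_learning`, `next_state`, `SLIP_PROB`, and so on) are hypothetical illustrations, not the article's simulator.

```python
import random

# Hypothetical toy environment (not from the article): the agent moves
# between floors 0..N_FLOORS-1 of a building.
N_FLOORS = 5
ACTIONS = (0, 1)   # 0 = move down, 1 = move up
SLIP_PROB = 0.1    # assumed chance that a random event reverses the chosen move


def reward(s, a):
    """Reward_t as a function of (s_t, a_t): 1.0 while at the top floor, else 0.0."""
    return 1.0 if s == N_FLOORS - 1 else 0.0


def next_state(s, a, rng):
    """s_{t+1} depends on s_t, a_t, and a random 'slip' event; clamped to valid floors."""
    step = 1 if a == 1 else -1
    if rng.random() < SLIP_PROB:
        step = -step
    return min(max(s + step, 0), N_FLOORS - 1)


def greedy(q, s):
    """Action with the highest estimated value in state s (ties go to the lowest index)."""
    return max(ACTIONS, key=lambda a: q[s][a])


def q_learning(episodes=3000, steps=30, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q table by walking trajectories and updating after every step."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_FLOORS)]
    for _ in range(episodes):
        s = rng.randrange(N_FLOORS)  # start each trajectory on a random floor
        for _ in range(steps):
            # Epsilon-greedy choice of a_t: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = greedy(q, s)
            r = reward(s, a)
            s_next = next_state(s, a, rng)
            # Incremental update from the single transition (s_t, a_t, Reward_t, s_{t+1}).
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    q = q_learning()
    # Greedy action (0 = down, 1 = up) for each floor after training.
    print([greedy(q, s) for s in range(N_FLOORS)])
```

Each update uses only one transition from the trajectory, which is what lets the policy improve incrementally while experience is still being collected, as the caption describes.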